Translating Pro-Drop Languages with Reconstruction Models

Authors

  • Longyue Wang
  • Zhaopeng Tu
  • Shuming Shi
  • Tong Zhang
  • Yvette Graham
  • Qun Liu
Abstract

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. With auxiliary training objectives, in terms of reconstruction scores, the parameters associated with the NMT model are guided to produce enhanced hidden representations that are encouraged as much as possible to embed annotated DP information. Experimental results on both Chinese–English and Japanese–English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is directly built on the training data annotated with DPs.
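
The auxiliary objective described in the abstract can be illustrated with a minimal sketch. It assumes the training loss is the translation negative log-likelihood plus a weighted negative log-likelihood of reconstructing the DP-annotated source sentence. The function names and the `recon_weight` parameter are hypothetical and chosen only for illustration; the abstract does not specify how the two terms are combined or weighted.

```python
import math
from typing import Sequence


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of a sequence from its per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)


def joint_loss(
    translation_probs: Sequence[float],
    reconstruction_probs: Sequence[float],
    recon_weight: float = 1.0,  # hypothetical weighting, not taken from the paper
) -> float:
    """Translation loss plus a weighted reconstruction loss.

    translation_probs: model probabilities of the reference target tokens.
    reconstruction_probs: probabilities of the DP-annotated source tokens,
        as rebuilt from the NMT hidden representations.
    """
    return sequence_nll(translation_probs) + recon_weight * sequence_nll(
        reconstruction_probs
    )


# Toy probabilities for a two-token translation and a one-token reconstruction.
loss = joint_loss([0.5, 0.5], [0.25])
```

Minimizing the second term would push the hidden representations to retain the annotated pronoun information, which is the effect the abstract attributes to the reconstruction scores.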

Journal:
  • CoRR

Volume: abs/1801.03257

Publication date: 2018